Classifier performances for deriving replicates

For each disease, we derive replicates of the mapping of RCTs across diseases by simulating what the mapping of RCTs within regions would have been if the misclassification of RCTs towards groups of diseases were corrected, given the sensitivity and specificity of the classifier for each group of diseases.

To estimate the performance of the classifier for each group of diseases, we use a test set of 2,763 trials manually classified according to the 27-class grouping of diseases used in this work. The test set is described in Atal et al., BMC Bioinformatics 2016.

This script calculates the sensitivity and specificity of the classifier for identifying each disease and for identifying other studies relevant to the burden of diseases, together with the number of successes and the number of trials needed to derive beta distributions.
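As a minimal sketch of these quantities, sensitivity and specificity follow directly from the four confusion counts, and each can be read as a number of successes out of a number of trials. The helper name `perf_from_counts` is hypothetical and is not part of `Evaluation_metrics.R`; the counts are those of the Tuberculosis row in the table produced by this notebook.

```r
# Hypothetical helper (not part of Evaluation_metrics.R): turns confusion
# counts into sensitivity/specificity plus the success and trial counts.
perf_from_counts <- function(TP, FP, TN, FN) {
  list(
    sens = TP / (TP + FN), sens_success = TP, sens_trials = TP + FN,
    spec = TN / (TN + FP), spec_success = TN, spec_trials = TN + FP
  )
}

# Tuberculosis counts from the test set: TP = 14, FP = 2, TN = 2745, FN = 2
tb <- perf_from_counts(TP = 14, FP = 2, TN = 2745, FN = 2)
tb$sens   # 14 / 16 = 0.875
tb$spec   # 2745 / 2747
```

For sensitivity, the "successes" are the true positives among the trials that truly concern the disease; for specificity, they are the true negatives among the trials that do not.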

1. Sensitivities and specificities based on test set



In [1]:

    
test_set <- read.table("/media/igna/Elements/HotelDieu/Cochrane/MetaMapBurden/Paper_classifier/NCT_data_classified_to28cats.txt")
dim(test_set)



In [2]:

    
# We remove injuries (category 28) from trials concerning the burden of diseases
test_set$GBDnp <- sapply(strsplit(as.character(test_set$GBDnp),"&&"),function(x){paste(x[x!="28"],collapse="&")})
test_set$GBD28 <- sapply(strsplit(as.character(test_set$GBD28),"&"),function(x){paste(x[x!="28"],collapse="&")})



In [3]:

    
tst <- strsplit(test_set$GBDnp,"&")
alg <- strsplit(test_set$GBD28,"&")
tst <- lapply(tst,as.numeric)
alg <- lapply(alg,as.numeric)
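To see what the two cells above do to a single field, here is a small sketch on made-up category strings. The `toy` vector is illustrative and not taken from the test set, and the sketch assumes categories are joined with a single `&`.

```r
toy <- c("12&28", "28", "3&13")

# Drop category 28 (injuries) and re-join the remaining categories
no_inj <- sapply(strsplit(toy, "&"),
                 function(x) paste(x[x != "28"], collapse = "&"))
no_inj         # "12" "" "3&13"

# Split again and convert to numeric: one vector of categories per trial
cats <- lapply(strsplit(no_inj, "&"), as.numeric)
lengths(cats)  # 1 0 2
```

A trial whose only category was injuries ends up with an empty vector, which is how it later counts as not relevant to the burden of diseases.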



In [4]:

    
source('Evaluation_metrics.R')



In [5]:

    
dis <- 1:27
Mgbd <- read.table("/home/igna/Desktop/Programs GBD/Classifier_Trial_GBD/Databases/Taxonomy_DL/GBD_data/GBD_ICD.txt")



In [6]:

    
# For each category in 1:27, compute TP, FP, TN and FN for finding that disease and for finding any other disease
set.seed(7212)

dis <- as.character(1:27)

PERF_F  <- data.frame()
for(i in dis){
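    # Classifier output recoded for disease i: 1 = disease i, 2 = any other disease
    ALG <- lapply(alg, function(x) {
        rs <- c()
        if (i %in% x) rs <- c(1)
        if (sum(setdiff(dis, i) %in% x) != 0) rs <- c(rs, 2)
        return(rs)
    })

    # Manual classification recoded the same way
    DT <- lapply(tst, function(x) {
        rs <- c()
        if (i %in% x) rs <- c(1)
        if (sum(setdiff(dis, i) %in% x) != 0) rs <- c(rs, 2)
        return(rs)
    })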

    CM <- conf_matrix(ALG,DT,c(1,2))

    PERF <- c(CM[1,],CM[2,])
    PERF_F <- rbind(PERF_F,PERF)
}
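The recoding in the loop can be tried on a handful of made-up trials. In the sketch below, `conf_matrix_sketch` is a hypothetical stand-in written only for this illustration: the real `conf_matrix` is defined in `Evaluation_metrics.R`, which is not shown here, and may behave differently. The stand-in assumes one row per class with columns TP, FP, TN, FN, which is the layout the column names assigned later suggest.

```r
# Hypothetical stand-in for conf_matrix(); the real function is defined in
# Evaluation_metrics.R and may differ. One row per class: TP, FP, TN, FN.
conf_matrix_sketch <- function(pred, ref, classes) {
  out <- t(sapply(classes, function(k) {
    p <- sapply(pred, function(x) k %in% x)  # class k predicted?
    r <- sapply(ref,  function(x) k %in% x)  # class k in the reference?
    c(TP = sum(p & r), FP = sum(p & !r), TN = sum(!p & !r), FN = sum(!p & r))
  }))
  rownames(out) <- classes
  out
}

# Recode one trial's categories relative to disease i:
# 1 = disease i present, 2 = at least one other disease present
recode <- function(x, i, dis) {
  rs <- c()
  if (i %in% x) rs <- c(1)
  if (any(setdiff(dis, i) %in% x)) rs <- c(rs, 2)
  rs
}

dis_toy <- 1:27
alg_toy <- list(1, c(1, 12), 12, numeric(0))  # classifier output (made up)
tst_toy <- list(1, 12, 12, 1)                 # manual labels (made up)

ALG_toy <- lapply(alg_toy, recode, i = 1, dis = dis_toy)
DT_toy  <- lapply(tst_toy, recode, i = 1, dis = dis_toy)
CM_toy  <- conf_matrix_sketch(ALG_toy, DT_toy, c(1, 2))
CM_toy
```

With these four trials, row 1 (disease 1) has one trial in each of TP, FP, TN and FN, and row 2 (any other disease) has two true positives and two true negatives.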



In [7]:

    
# We add the performance of the classifier for identifying trials relevant to the burden of diseases
ALG <- lapply(alg,length)
DT <- lapply(tst,length)
CM <- conf_matrix(ALG,DT,1)
PERF <- c(CM,rep(NA,4))
PERF_F <- rbind(PERF_F,PERF)
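One way to read this last row is as a binary problem: a trial is relevant to the burden of diseases if at least one of the 27 categories remains once injuries are removed. The sketch below makes that assumption explicit on made-up label lists and counts the four outcomes directly, without relying on `conf_matrix` (whose handling of the length inputs is not shown in this notebook).

```r
alg_rel_toy <- list(1, c(1, 12), numeric(0), numeric(0), 12)  # classifier (made up)
tst_rel_toy <- list(1, 12, numeric(0), 3, numeric(0))         # manual (made up)

# Assumption: a trial is "relevant" when it has at least one category left
p <- lengths(alg_rel_toy) > 0
r <- lengths(tst_rel_toy) > 0

rel_counts <- c(TP = sum(p & r), FP = sum(p & !r),
                TN = sum(!p & !r), FN = sum(!p & r))
rel_counts
```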



In [8]:

    
PERF_F <- data.frame(PERF_F)
names(PERF_F) <- paste(rep(c("TP","FP","TN","FN"),2),rep(c("_Dis","_Oth"),each=4),sep="")



In [9]:

    
PERF_F$dis <- c(dis,0)
PERF_F$GBD <- c(as.character(Mgbd$cause_name[-28]),"All")



In [10]:

    
PERF_F <- PERF_F[,c(9,10,1:8)]



In [11]:

    
PERF_F









    





	dis	GBD	TP_Dis	FP_Dis	TN_Dis	FN_Dis	TP_Oth	FP_Oth	TN_Oth	FN_Oth
1	1	Tuberculosis	14	2	2745	2	2142	204	267	150
2	2	HIV/AIDS	86	7	2659	11	2072	214	333	144
3	3	Diarrhea, lower respiratory infections, meningitis, and other common infectious diseases	40	21	2693	9	2113	207	299	144
4	4	Malaria	14	1	2748	0	2142	204	267	150
5	5	Neglected tropical diseases excluding malaria	6	0	2756	1	2150	203	261	149
6	6	Maternal disorders	17	5	2715	26	2130	210	289	134
7	7	Neonatal disorders	4	7	2746	6	2148	205	262	148
8	8	Nutritional deficiencies	11	15	2732	5	2140	201	272	150
9	9	Sexually transmitted diseases excluding HIV	0	3	2759	1	2155	203	255	150
10	10	Hepatitis	14	4	2742	3	2141	208	262	152
11	11	Leprosy	2	1	2760	0	2154	203	256	150
12	12	Neoplasms	933	42	1763	25	1213	214	1198	138
13	13	Cardiovascular and circulatory diseases	178	60	2468	57	1951	217	466	129
14	14	Chronic respiratory diseases	76	17	2665	5	2074	209	328	152
15	15	Cirrhosis of the liver	19	17	2723	4	2133	211	267	152
16	16	Digestive diseases (except cirrhosis)	24	28	2703	8	2129	199	289	146
17	17	Neurological disorders	79	40	2630	14	2060	211	339	153
18	18	Mental and behavioral disorders	134	33	2587	9	2014	198	402	149
19	19	Diabetes, urinary diseases and male infertility	196	63	2458	46	1930	213	473	147
20	20	Gynecological diseases	9	8	2744	2	2146	206	262	149
21	21	Hemoglobinopathies and hemolytic anemias	10	4	2743	6	2143	203	270	147
22	22	Musculoskeletal disorders	100	40	2610	13	2046	188	382	147
23	23	Congenital anomalies	22	34	2706	1	2121	205	275	162
24	24	Skin and subcutaneous diseases	18	24	2717	4	2134	198	281	150
25	25	Sense organ diseases	52	40	2667	4	2085	190	322	166
26	26	Oral disorders	3	4	2751	5	2150	207	258	148
27	27	Sudden infant death syndrome	0	0	2763	0	2156	203	254	150
28	0	All	2022	165	314	262	NA	NA	NA	NA



In [12]:

    
write.csv(PERF_F,'Tables/Performances_per_27disease_data.csv')
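The counts saved above are what a later step needs in order to draw replicates of sensitivity and specificity. A minimal sketch, assuming a uniform Beta(1, 1) prior so that sensitivity follows Beta(TP + 1, FN + 1) and specificity follows Beta(TN + 1, FP + 1). The prior, the number of draws and the seed are assumptions made for this illustration, not choices stated in this notebook.

```r
set.seed(1)      # arbitrary seed, for reproducibility of the sketch only
n_rep <- 10000   # arbitrary number of replicates

# HIV/AIDS row of PERF_F: TP = 86, FP = 7, TN = 2659, FN = 11
TP <- 86; FP <- 7; TN <- 2659; FN <- 11

# Assumed Beta(1, 1) prior: shape parameters are successes + 1, failures + 1
sens_rep <- rbeta(n_rep, TP + 1, FN + 1)
spec_rep <- rbeta(n_rep, TN + 1, FP + 1)

# Mean of the draws next to the raw proportions
c(mean(sens_rep), TP / (TP + FN))
c(mean(spec_rep), TN / (TN + FP))
quantile(sens_rep, c(0.025, 0.975))
```

Diseases with few positive trials in the test set, such as Leprosy with 2 true positives and 0 false negatives, give much wider distributions for sensitivity than this example does.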


